An Efficient Algorithm for Data Cleaning of Log File using File Extensions

Authors

  • Surbhi Anand
  • Rinkle Rani Aggarwal
Abstract

The World Wide Web is a vast repository of web pages that provides Internet users with a wealth of information. With the growth in the number and complexity of websites, the web has become massively large. Web Usage Mining is a branch of web mining that applies mining techniques to web server logs in order to extract the behavior of users. A Web Usage Mining process comprises three phases: data preprocessing, pattern discovery, and pattern analysis. Data preprocessing tasks are carried out prior to the application of mining algorithms. Preprocessing translates the raw data collected from server log files into a useful data abstraction. Appropriate analysis of a web server log helps manage websites efficiently from both the administrators' and the users' perspectives. Preprocessing results also strongly influence the later phases of Web Usage Mining, which makes the preprocessing of server log files a significant step. This paper focuses on the Web Usage Mining process and explores the field of data cleaning.
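The abstract does not reproduce the paper's algorithm, so the following is only a minimal sketch of what extension-based log cleaning might look like, not the authors' method. It assumes log entries in Common Log Format, and the extension list and the names `IRRELEVANT_EXTENSIONS`, `requested_extension`, and `clean_log` are hypothetical choices made for illustration.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed set of extensions treated as noise (images, styles, scripts).
# This list is illustrative and is not taken from the paper.
IRRELEVANT_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js"}

# Matches the quoted request field of a Common Log Format entry,
# for example: "GET /index.html HTTP/1.0"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*"')


def requested_extension(log_line):
    """Return the lower-cased file extension of the requested URL.

    Returns None when the line has no recognizable request field, and an
    empty string when the requested path has no extension.
    """
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    # Drop the query string and fragment before looking at the extension.
    path = urlsplit(match.group("url")).path
    return PurePosixPath(path).suffix.lower()


def clean_log(lines, irrelevant=IRRELEVANT_EXTENSIONS):
    """Keep only entries whose requested resource is not in `irrelevant`."""
    kept = []
    for line in lines:
        extension = requested_extension(line)
        if extension is None:
            continue  # malformed entry: no request field found
        if extension in irrelevant:
            continue  # embedded resource, not an explicit page request
        kept.append(line)
    return kept
```

Calling `clean_log` on an iterable of raw log lines returns the entries that survive the filter. A fuller cleaning step would typically also look at status codes, request methods, and robot traffic, which this sketch leaves out.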


Similar resources

Suspend-aware Segment Cleaning in Log-structured File System

The suspend feature of the modern smart device practically suppresses the background segment cleaning of the log-structured file system. In this work, we develop Suspend-aware Segment Cleaning for the log-structured file system. We seamlessly integrate the segment cleaning into the suspend module of the smartphone OS so that the log-structured file system can reclaim the free segments without i...

Ex Vivo Comparison of File Fracture and File Deformation in Canals with Moderate Curvature: Neolix Rotary System versus Manual K-files

Background and Aim: Cleaning and shaping is one of the important steps in endodontic treatment, which has an important role in root canal treatment outcome. This study evaluated the rate of file fracture and file deformation in Neolix rotary system and K-files in shaping of the mesiobuccal canal of maxillary first molars with moderate curvature. Materials and Methods: In this ex vivo exp...

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...

Comparison of Cleaning Efficacy and Instrumentation Time of Reciproc and Mtwo Rotary Systems in Primary Molars

Background and Aim: Pulpectomy of primary teeth is commonly performed with hand files and instruments. However, it is a time-consuming procedure. Compared to hand files, rotary instrumentation has more advantages. The purpose of this in vitro study was to compare the cleaning efficacy and time taken for instrumentation of deciduous molars using Reciproc and Mtwo rotary systems. Materials an...

Heuristic Cleaning Algorithms in Log-Structured File Systems

Research results show that while Log-Structured File Systems (LFS) offer the potential for dramatically improved file system performance, the cleaner can seriously degrade performance, by as much as 40% in transaction processing workloads [9]. Our goal is to examine trace data from live file systems and use those to derive simple heuristics that will permit the cleaner to run without interfering...

Publication date: 2012